(57) An image is automatically assessed with respect to certain features, wherein the assessment is a determination of the degree of importance, interest or attractiveness of the image. First, a digital image is obtained corresponding to the image. Then one or more quantities are computed that are related to one or more features in the digital image, including one or more features pertaining to the content of the digital image. The quantities are processed with a reasoning algorithm that is trained on the opinions of one or more human observers, and an output is obtained from the reasoning algorithm that assesses the image. More specifically, the reasoning algorithm is a Bayesian network that provides a score which, when done for a group of images, selects one image as the emphasis image. The features pertaining to the content of the digital image include people-related features and/or subject-related features. Moreover, additional quantities may be computed that relate to objective measures of the digital image, such as colorfulness and/or sharpness.
Description

FIELD OF THE INVENTION 
BACKGROUND OF THE INVENTION 

[0002] Image assessment and understanding deal with problems that are easily solved by human beings given their [...] photographic applications include main subject detection, scene classification, sky and grass detection, people detection, automatic detection of orientation, etc. In a variety of applications that deal with a group of pictures, it is important to rank the images in terms of [...] where a group of digital images are automatically organized into digital photo albums. This involves clustering the images into separate events and then laying out each event in some logical order, if possible. [...] the belief that some images [...]

[0003] A number of known algorithms, such as dud detection, event detection and page layout algorithms, are useful in connection with automatic albuming applications. Dud detection addresses the elimination or [...] of duplicate images and poor quality images, while event detection involves the clustering of images into separate [...] that belong to the same [...] of page layout is to lay out each event in some logical and pleasing presentation, e.g., to find the most pleasing and space-efficient presentation of the images on each page. It would be desirable to be able to select the [...] attention in the page layout [...]

[0004] Due to the nature of the image assessment problem, i.e., that an automated system is expected to generate results that are representative of high-level cognitive human (understanding) processes, [...] with the aid of an operator to determine correspondence of visual features to sensitive language that is displayed for use by the operator. [...] description is difficult to use for relative ranking of images. The '945 patent discloses a system for evaluating the psychological effect of text and graphics in a document. [...] without regard to its [...] relative ranking. Besides their complexity and orientation toward [...] effect, these systems focus on the analysis and creation of a perceptual impression rather than on the assessment and utilization of an existing image.

SUMMARY OF THE INVENTION 

[0005] The present invention is directed to overcoming one or more of the problems set forth above. Briefly summarized, according to one aspect of the present invention, a method is disclosed for assessing an image with respect to certain features, wherein the assessment is a determination of the degree of importance, interest or attractiveness of [...] corresponding to the image. Then one or more [...] one or more features pertaining to the content [...] that is trained on the opinions of one or [...] aspect of the invention, the features pertaining to the content of the digital image include at least one of people-related features and subject-related features. Moreover, additional quantities may be computed that relate to one or more objective measures of the digital image, such as colorfulness or sharpness.

[0006] From another aspect, the invention may be seen as assessing the emphasis or appeal of an image, as here[...] such as [...] image features are [...]

a. [...]
b. Objective features: the colorfulness and sharpness of the image.

c. Subject related features: the size of main subject and the goodness of composition based on main subject 
mapping. 

While the above-noted features are adequate for emphasis assessment, it is preferable that certain additional relative-salient image features are considered for appeal assessment, such as:

a. The representative value of each image in terms of color content.

b. The uniqueness of the picture aspect format of each image.

[0007] An assessment of an image is obtained with a reasoning engine, such as a Bayesian network, which accepts as input the above-noted features and is trained to generate image assessment values. This assessment may be an intrinsic assessment for individual images, in which case the self-salient features are processed by a Bayesian network trained to generate the image appeal values, or the assessment may be a relative assessment for a group of images, in which case the self-salient and, optionally, the relative-salient features are processed by a Bayesian network trained to generate image emphasis values.

[0008] The advantage of the invention lies in its ability to perform an assessment of one or more images without human intervention. In a variety of applications that deal with a group of pictures, such as automatic albuming, such an algorithmic assessment enables the automatic ranking of images in terms of their logical order, so that they can be more efficiently processed or treated according to their order.

[0009] These and other aspects, objects, features and advantages of the present invention will be more clearly 
understood and appreciated from a review of the following detailed description of the preferred embodiments and 
appended claims, and by reference to the accompanying drawings. 



BRIEF DESCRIPTION OF THE DRAWINGS

[0010] FIG. 1 is a block diagram of a network for calculating an emphasis value for an image.

[0011] FIG. 2 is a block diagram of a network for calculating an appeal value for an image.

[0012] FIG. 3 is a block diagram showing in more detail the components of main subject detection as shown in Figures 1 and 2.

[0013] FIG. 4 is a block diagram of a network architecture for calculating the relative emphasis values of a group of images.

[0014] FIGS. 5A-5D are detailed diagrams of the component methods shown in Figure 3 for main subject detection.

[0015] FIG. 6 is a detailed diagram of a method for determining the colorfulness of an image.

[0016] FIG. 7 is a diagram of chromaticity plane wedges that are used for the colorfulness feature computation.

[0017] FIG. 8 is a block diagram of a method for skin and face detection.

[0018] FIG. 9 is a detailed block diagram of main subject detection as shown in Figure 5.

[0019] FIG. 10 is a diagram of a two level Bayesian net used in the networks shown in Figures 1 and 2.

[0020] FIG. 11 is a perspective diagram of a computer system for practicing the invention set forth in the preceding figures.



DETAILED DESCRIPTION OF THE INVENTION 



[0021] In the following description, a preferred embodiment of the present invention will be described as a software program. Those skilled in the art will readily recognize that the equivalent of such software may also be constructed in hardware. Because image processing algorithms and systems are well known, the present description will be directed in particular to algorithms and systems forming part of, or cooperating more directly with, the method in accordance with the present invention. Other aspects of such algorithms and systems, and hardware and/or software for producing and otherwise processing the image signals involved therewith, not specifically shown or described herein may be selected from such systems, algorithms, components and elements thereof known in the art. Given the description as set forth in the following specification, all software implementation thereof as a computer program is conventional and within the ordinary skill in such arts.

[0022] Still further, as used herein, the computer program may be stored in a computer readable storage medium, which may comprise, for example: magnetic storage media such as a magnetic disk (such as a floppy disk) or magnetic tape; optical storage media such as an optical disc, optical tape, or machine readable bar code; solid state electronic storage devices such as random access memory (RAM), or read only memory (ROM); or any other physical device or medium employed to store a computer program.

[0023] In a variety of applications that deal with a group of pictures, it is important to rank the images in terms of [...] so that they may be processed or treated according to these values. As mentioned before, a photographic application currently of interest is automatic albuming, where a group of digital images are automatically organized into digital photo albums. This involves clustering the images into separate events [...] related assessments [...] image appeal and image emphasis. Image appeal is the intrinsic degree of importance, interest or attractiveness of an individual picture; image emphasis, on the other hand, is the relative importance, interest or attractiveness of the picture with respect to other pictures in the event or group.

[0024] [...] important image in the group of [...] detection algorithm would output groups of images, where all images in a group belong to the same event. The assessment algorithm would be expected to operate on the images of each event and assign assessment values (i.e., emphasis and/or appeal values) to each image. The assessment values may be viewed as metadata that are associated with every image in a particular group and may be exploited by other algorithms. In such a proposed system, the page layout algorithm would take as input the relative assessment values of all images in each event, so that the images [...] of pictures in an event are used by a page layout algorithm to assign relative size and location of the pictures on album [...]

[0025] [...] proposed system [...] many open questions about the type of system architecture and the selection of effective features for evaluation. An architecture that has been successfully applied to other image [...] classification stage. [...] an ensemble of features. For this there are two likely approaches. [...] approach is that there is no good justification for the selection of features. A second approach is to base feature selection [...] through controlled experiments. Since the inventor found no such experiments on record, a ground truth study was conducted to obtain data that would point to meaningful features. The results of the ground truth study are used for feature selection and for training the classifier.

[0026] [...] an image emphasis network 10 for computing an emphasis value is shown to comprise two stages: a feature extraction stage 12 and a classification stage 14. The feature extraction stage 12 [...] image feature characteristic, where a quantitative measure of the feature is expressed by the value of the output of the algorithm. The outputs of the feature extraction [...] integrated by the classification stage 14 to compute an emphasis value. This value may, e.g., range from 0 to 100 and [...] likelihood or belief that the processed image is the emphasis image. After the emphasis values have been computed for a group of images in separate image emphasis networks 10.1, 10.2 ... 10.N, as shown in Figure 4, the emphasis values are compared in a comparator stage 16 and normalized in respective normalization stages 16.1, [...] The image with the highest emphasis value is chosen as the emphasis image for the group.
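
The comparison and selection described in this paragraph can be sketched as follows. This is only an illustrative sketch: the source does not state how the normalization stages operate, so dividing each value by the group total is an assumption, and the function name `select_emphasis_image` is hypothetical.

```python
from typing import List, Tuple


def select_emphasis_image(emphasis_values: List[float]) -> Tuple[int, List[float]]:
    """Return (index of the emphasis image, normalized values) for one group.

    emphasis_values holds one classifier output per image in the group,
    e.g. values in the range 0 to 100.
    """
    if not emphasis_values:
        raise ValueError("group must contain at least one image")
    total = sum(emphasis_values)
    if total > 0:
        # Assumed normalization: each value divided by the group total.
        normalized = [v / total for v in emphasis_values]
    else:
        # Degenerate group: every image is equally (un)emphasized.
        normalized = [1.0 / len(emphasis_values)] * len(emphasis_values)
    # The image with the highest emphasis value is the emphasis image.
    best = max(range(len(normalized)), key=lambda i: normalized[i])
    return best, normalized
```

For a group scored `[20.0, 70.0, 10.0]`, the second image (index 1) would be selected.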

[0027] An ensemble of features was selected for the feature extraction stage 12 on the basis of ground truth studies [...] such as sharpness, contrast, [...] and exposure, although one or more of these traditional metrics may continue to have value in the calculation of an assessment value. The selected features may be generally divided into three categories: (a) features related to [...] features related to objective measures of the image. Referring [...] skin area detector 20, a close-up detector 22 and a people detector 24. The input image i is typically processed through a skin detector 26 and a face detector 28 to generate intermediate values suitable for processing by the people-related feature detectors 20, 22 and 24. The features related to the main subject are extracted by a composition detector 30 and a subject size detector 32, based on input from a main subject detector 34. The composition detector 30 is composed of several composition-related main subject [...] compactness algorithm 30.3. The main subject data is clustered in a clustering stage 31 and then provided to the composition-related algorithms 30.2 and 30.3 and to the subject size algorithm 32. The features [...] representative the color content of an image is relative to a group of images is extracted by a representative color detector 42.

[0028] [...] is defined as the degree of relative importance, interest or attractiveness of an image with respect to other images in a group. Since each image must be evaluated in relation to other images in a group, the image emphasis calculation thus embodies a network of image emphasis networks 10.1, 10.2 ... 10.N, such as shown in Figure 4, [...]
images as to their respective emphasis values. In practice, there may be but one image emphasis network 10, which is repeatedly engaged to determine the image emphasis value of a series of images; in this case, the sequentially obtained results could be stored in an intermediate storage (not shown) for input to the comparator 16. The feature ensemble shown in Figure 2, which is a subset of the feature ensemble shown in Figure 1, is used to calculate a value representative of image appeal, which is defined as the intrinsic degree of importance, interest or attractiveness of an image in an absolute sense, that is, without reference to other images. The features shown in Figure 2 are thus referred to as self-salient features, inasmuch as these features can stand on their own as an assessment of an image. In comparison, two additional features are detected in Figure 1, namely, the unique format feature and the representative color feature; these are referred to as relative-salient features, inasmuch as these features are measurements that necessarily relate to other images. (These features, however, are optional insofar as a satisfactory measure of emphasis can be obtained from the self-salient features alone.) Consequently, an assessment of both appeal and emphasis involves self-salient features, while only an assessment of emphasis may involve relative-salient features.

[0029] The extraction of the feature ensembles according to Figures 1 and 2 involves the computation of corresponding feature quantities, as set forth below.
Objective Features



[0030] Objective features are the easiest to compute and provide the most consistent results in comparison to other types of features. Methods for computing them have been available for some time, and a large part of imaging science is based on such measures. Although a large number of objective features could potentially be computed, only colorfulness and sharpness are considered for purposes of both image emphasis and appeal (Figures 1 and 2), and additionally unique format and representative color for purposes of image emphasis (Figure 1). Other objective measures, such as contrast and noise, may be found useful in certain situations and are intended to be included within the coverage of this invention.



Colorfulness

[0031] The colorfulness detector 38 provides a quantitative measure of colorfulness based on the observation that colorful pictures have colors that display high saturation at various hues. This was determined in ground truth studies by examining for the presence of high saturation colors along various hues. The assumption of sRGB color space was made with respect to the image data. In particular, and as shown in Figure 6, the colorfulness detector 38 implements the following steps for computing colorfulness. Initially, in step 60, the input image values i are transformed to a luminance/chrominance space. While many such transformations are known to the skilled person and may be used with success in connection with the invention, the preferred transformation is performed according to the following expressions:

$$\text{Neutral} = \frac{R + G + B}{3}$$

$$\text{Green-Magenta} = \frac{2G - R - B}{4}$$

$$\text{Illumination} = \frac{B - R}{2}$$



where neutral is a measure of luminance, and green-magenta and illumination are a measure of chrominance. In step 62, the chrominance plane (illumination, green-magenta) is divided and quantized into twelve chromaticity plane wedges, as shown in Figure 7, which are referred to as angular bins. Next, in step 64, each pixel is associated with one of the angular bins if its chrominance component lies within the bounds of that bin. The level of saturation (which is the distance from origin) is calculated in step 66 for each pixel in each angular bin. The number of high saturation pixels that populate each angular bin is then measured in step 68, where a high saturation pixel is one whose distance from the origin in the chrominance plane is above a certain threshold Tc (e.g., Tc=0.33). For each angular bin, the bin is determined to be active in step 70 if the number of high saturation pixels exceeds a certain threshold Tg (e.g., Tg=250 pixels). Colorfulness is then calculated in step 72 according to the following expression:
$$\text{Colorfulness} = \min\left\{ \frac{\text{Number of active bins}}{10},\ 1.0 \right\}$$

Note that this definition of colorfulness assumes that if 10 out of the 12 bins are populated, colorfulness is considered to be 1.0 and the image is most colorful.
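
Steps 60 to 72 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the detector itself: pixel values are taken to be sRGB components scaled to the range 0 to 1, the chrominance transform is assumed to be green-magenta = (2G - R - B)/4 and illumination = (B - R)/2, and the twelve wedges of Figure 7 are assumed to be equal 30-degree sectors. The function and parameter names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def colorfulness(pixels: Iterable[Tuple[float, float, float]],
                 saturation_threshold: float = 0.33,
                 active_threshold: int = 250,
                 n_bins: int = 12) -> float:
    """Fraction of active hue wedges, capped at 1.0 when 10 or more are active."""
    counts = [0] * n_bins
    wedge = 2.0 * math.pi / n_bins
    for r, g, b in pixels:
        # Step 60: assumed luminance/chrominance transform (chrominance only).
        green_magenta = (2.0 * g - r - b) / 4.0
        illumination = (b - r) / 2.0
        # Step 66: saturation is the distance from the chrominance origin.
        saturation = math.hypot(illumination, green_magenta)
        if saturation > saturation_threshold:
            # Steps 62-64: assign the pixel to its angular bin.
            angle = math.atan2(green_magenta, illumination) % (2.0 * math.pi)
            index = min(int(angle / wedge), n_bins - 1)
            # Step 68: count high saturation pixels per bin.
            counts[index] += 1
    # Step 70: a bin is active if it holds enough high saturation pixels.
    active = sum(1 for c in counts if c > active_threshold)
    # Step 72: colorfulness saturates at 1.0 once 10 bins are active.
    return min(active / 10.0, 1.0)
```

With these assumptions, an image consisting only of 300 saturated red pixels activates a single wedge, while a neutral gray image activates none.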

Sharpness

[0032] The sharpness detector 36 implements the following steps to find sharpness features in the image:

a) The image is cropped at a 20% level along the border and converted to grayscale by extracting the green channel;

b) The image edges are detected in the green channel using a Sobel operator after running a 3x3 averaging filter to reduce noise;

c) An edge histogram is formed and the regions that contain the strongest edges are identified as those that are above the 90th percentile of the edge histogram;

d) The strongest-edge regions are refined through median filtering, and the statistics of the strongest edges are computed; and

e) The average of the strongest edges provides an estimate of sharpness.

Further details of the method employed for sharpness detection may be found in commonly assigned U.S. Serial No. [...], entitled "[...] Automatically Detecting Digital Images that are Undesirable for Placing in Albums", [...] and Alexander Loui, and which is incorporated herein by reference.
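
Steps a) to e) above can be sketched in pure Python as follows. This is an illustrative sketch only. It assumes that the 20% crop removes 20% of the image on every side, that the median filter of step d) is applied to a binary mask of the strongest edges, and that the input is the green channel as a list of rows at least seven pixels wide and high after cropping. All function names are hypothetical.

```python
from statistics import median
from typing import Callable, List

Grid = List[List[float]]


def _filter3x3(img: Grid, reduce_fn: Callable[[List[float]], float]) -> Grid:
    """Apply reduce_fn to every 3x3 neighbourhood (border pixels are dropped)."""
    h, w = len(img), len(img[0])
    return [[reduce_fn([img[y + dy][x + dx]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1)])
             for x in range(1, w - 1)] for y in range(1, h - 1)]


def _sobel(window: List[float]) -> float:
    """Sobel gradient magnitude for a row-major 3x3 window."""
    a, b, c, d, _, f, g, h, i = window
    gx = (c + 2 * f + i) - (a + 2 * d + g)
    gy = (g + 2 * h + i) - (a + 2 * b + c)
    return (gx * gx + gy * gy) ** 0.5


def sharpness(green: Grid, crop: float = 0.20) -> float:
    h, w = len(green), len(green[0])
    top, left = int(h * crop), int(w * crop)
    cropped = [row[left:w - left] for row in green[top:h - top]]   # step a
    smooth = _filter3x3(cropped, lambda v: sum(v) / 9.0)            # step b
    edges = _filter3x3(smooth, _sobel)
    flat = sorted(e for row in edges for e in row)                  # step c
    threshold = flat[int(0.9 * (len(flat) - 1))]
    mask = [[1.0 if e > threshold else 0.0 for e in row] for row in edges]
    refined = _filter3x3(mask, median)                              # step d
    # refined is one pixel smaller on each side than edges, hence the +1.
    strong = [edges[y + 1][x + 1]
              for y, row in enumerate(refined)
              for x, keep in enumerate(row) if keep]
    return sum(strong) / len(strong) if strong else 0.0             # step e
```

Under these assumptions a hard vertical step edge scores higher than a smooth horizontal ramp of the same size.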

Format Uniqueness

[0033] Participants in the ground truth experiment indicated that pictures taken in APS "panoramic" mode are more deserving of emphasis. Preliminary analysis of the ground truth data indicated that if a picture was the only panoramic picture in a group, this fact increases its likelihood of being selected as the emphasis image. The relative feature "format uniqueness" represents this property.

[0034] The unique format detector 40 implements the following algorithm for each image i in the group, in which the format f is based on the long and short pixel dimensions l_i, s_i of the image:
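$$f_i = \begin{cases} C, & l_i/s_i < 1.625, \\ H, & 1.625 \le l_i/s_i < 2.25, \\ P, & 2.25 \le l_i/s_i. \end{cases}$$

Then format uniqueness U is

$$U_i = \begin{cases} 1, & f_i \ne f_j,\ \forall j \ne i, \\ 0, & \text{otherwise.} \end{cases}$$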
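
The format classification and uniqueness test can be sketched as follows. This is a minimal sketch that assumes an image is unique when no other image in the group shares its format class, and that the lower bound of each ratio interval is inclusive; the function names are hypothetical.

```python
from typing import List, Tuple


def picture_format(width: int, height: int) -> str:
    """Classify an image as C, H or P from its long/short side ratio."""
    long_side, short_side = max(width, height), min(width, height)
    ratio = long_side / short_side
    if ratio < 1.625:
        return "C"
    if ratio < 2.25:
        return "H"
    return "P"


def format_uniqueness(sizes: List[Tuple[int, int]]) -> List[int]:
    """Return 1 for each image whose format occurs once in the group, else 0."""
    formats = [picture_format(w, h) for w, h in sizes]
    return [1 if formats.count(f) == 1 else 0 for f in formats]
```

For a group of two 1536x1024 pictures and one 3072x1024 picture, only the last one receives a uniqueness of 1.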



Representative Color

[0035] The representative color detector 42 implements the following steps to determine how representative the color of an image is:

1. For each image i, compute the color histogram h_i(R, G, B) (in RGB or Luminance/Chrominance space).

2. Find the average color histogram for the group by averaging all of the image histograms as follows:

$$h_{ave}(R, G, B) = \frac{1}{N} \sum_{i=1}^{N} h_i(R, G, B)$$

3. For each image i, compute the distance between the histogram of the image and the average color histogram (Euclidean or histogram intersection distance), as follows:

$$d_i = \sum_{R, G, B} \left| h_i(R, G, B) - h_{ave}(R, G, B) \right|$$

4. Find the maximum of the distances computed in 3, as follows:

$$d_{max} = \max_i d_i$$

5. The representative measure r is obtained by dividing each of the distances by the maximum distance (can vary from 0 to 1), as follows:

$$r_i = \frac{d_i}{d_{max}}$$
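
Steps 1 to 5 can be sketched as follows. This is an illustrative sketch: it assumes 8-bit RGB pixels, a coarse histogram of four levels per channel normalized by pixel count, and a sum of absolute bin differences as the distance. The Euclidean or histogram intersection distance named in step 3 could be substituted. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Histogram = Dict[Tuple[int, int, int], float]


def color_histogram(pixels: Sequence[Pixel], levels: int = 4) -> Histogram:
    """Step 1: normalized RGB histogram with `levels` bins per channel."""
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    return {key: count / len(pixels) for key, count in counts.items()}


def representative_color(images: List[Sequence[Pixel]]) -> List[float]:
    """Return r_i for every image in a non-empty group."""
    hists = [color_histogram(p) for p in images]
    keys = set().union(*hists)
    # Step 2: average histogram over the group.
    avg = {k: sum(h.get(k, 0.0) for h in hists) / len(hists) for k in keys}
    # Step 3: distance of each histogram from the average (assumed L1).
    dists = [sum(abs(h.get(k, 0.0) - avg[k]) for k in keys) for h in hists]
    # Steps 4 and 5: divide by the maximum distance.
    d_max = max(dists)
    return [d / d_max if d_max > 0 else 0.0 for d in dists]
```

In a group of two red images and one blue image, the blue image lies farthest from the average histogram and receives the value 1.0.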



People-Related Features

[0036] People related features are important in determining image emphasis, but many of the positive attributes that are related to people are difficult to compute, e.g. people smiling, people facing camera, etc. Skin detection methods allow the computation of some people-related features such as: whether people are present, the magnitude of the skin area, and the amount of closeup.



Skin and Face Detection 



[0037] The skin detection method that is used by the skin detector 26, and the face detection method that is used by the face detector 28, are based on the method disclosed in commonly assigned patent application Serial No. 09/112,661 entitled "A Method for Detecting Human Faces in Digitized Images" which was filed July 9, 1998 in the names of H.C. Lee and H. Nicponski, and which is incorporated herein by reference.

[0038] Referring to Fig. 8, an overview is shown of the method disclosed in Serial No. 09/112,661. The input images are color balanced to compensate for predominant global illumination in step S102, which involves conversion from (r,g,b) values to (L,s,t) values. In the (L,s,t) space, the L axis represents the brightness of a color, while the s and t axes are chromatic axes. The s component approximately represents the illuminant variations from daylight to tungsten light, from blue to red. The t component represents an axis between green and magenta. A number of well-known color balancing algorithms may be used for this step, including a simple method of averaging-to-gray. Next, a k-mode clustering algorithm is used for color segmentation in step S104. A disclosure of this algorithm is contained in commonly assigned U.S. Patent 5,418,895, which is incorporated herein by reference. Basically, a 3-D color histogram in (L,s,t) space is formed from the input color image and processed by the clustering algorithm. The result of this step is a region map with each connected region having a unique label. For each region, the averaged luminance and chromaticity are computed in step S106. These features are used to predict possible skin regions (candidate skin regions) based on conditional probability and adaptive thresholding. Estimates of the scale and in-plane rotational pose of each skin region are then made by fitting a best ellipse to each skin region in step S108. Using a range of scales and in-plane rotational pose around these estimates, a series of linear filtering steps are applied to each facial region in step S110 for identifying tentative facial features. A number of probability metrics are used in step S112 to predict the likelihood that the region actually represents a facial feature and the type of feature it represents.

[0039] Features that pass the previous screening step are used as initial features in a step S114 for a proposed face. Using projective geometry, the identification of the three initial features defines the possible range of poses of the head. Each possible potential face pose, in conjunction with a generic three-dimensional head model and ranges of variation of the position of the facial features, can be used to predict the location of the remaining facial features. The list of candidate facial features can then be searched to see if the predicted features were located. The proximity of a candidate [...] predicted location and orientation affects the probabilistic estimate of the validity of that feature.

[0040] A Bayesian network probabilistic model of the head is used in a step S116 to interpret the accumulated evidence of the presence of a face. The prior probabilities of the network are extracted from a large set of training images with heads in various orientations and scales. The network is initiated with the proposed features of the candidate face, with their estimated probabilities based on computed metrics and spatial conformity to the template. The network is then executed with these initial conditions until it converges to a global estimate of the probability of face presence. This probability can be compared against a hard threshold or left in probabilistic form when a binary assessment is not needed. Further details of this skin and face detection method may be found in Serial No. 09/112,661, which is incorporated herein by reference.

Skin Area 

[0041] The percentage of skin/face area in a picture is computed by the skin area detector 20 on its own merit, and also as a preliminary step to people detection and close-up detection. Consequently, the output of the skin area detector 20 is connected to the classification stage 14 and also input to the close-up detector 22 and the people detector 24. Skin area is a continuous variable between 0 and 1 and correlates to a number of features related to people. For example, for pictures taken from the same distance, increasing skin area indicates that there are more people in the picture and correlates with the positive indicator of "whole group in photo." Alternatively, if two pictures contain the same number of people, larger skin area may indicate larger magnification, which correlates with the positive attribute of closeup. Other explanations for larger skin area are also possible due to subject positioning.

Close-up 

[0042] The close-up detector 22 employs the following measure for determining close-up:

a) skin detection is performed and the resulting map is examined at the central region (25% from border); and

b) close-up is determined as the percentage of skin area at the central portion of the image.

In some cases, face detection would be more appropriate than skin detection for determining close-up.

People Present

[0043] [...] in the image. The percentage of skin pixels in the image is computed and people are assumed present when the skin percentage is above a threshold T number of pixels (e.g., T = 20 pixels). People present is a binary feature indicating the presence or absence of people for 1 or 0 respectively.
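
The three skin-based measures (skin area, close-up and people present) can be sketched as follows. This is only a sketch: it assumes a binary skin map is already available as a list of rows of booleans, that "25% from border" means discarding a quarter of the image on every side, and that the people-present threshold is a pixel count. The function and parameter names are hypothetical.

```python
from typing import List

Mask = List[List[bool]]


def skin_fraction(mask: Mask) -> float:
    """Skin area: fraction of pixels marked as skin, between 0 and 1."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


def close_up(mask: Mask, border: float = 0.25) -> float:
    """Close-up: skin fraction inside the central region of the skin map."""
    h, w = len(mask), len(mask[0])
    top, left = int(h * border), int(w * border)
    central = [row[left:w - left] for row in mask[top:h - top]]
    return skin_fraction(central)


def people_present(mask: Mask, min_skin_pixels: int = 20) -> int:
    """People present: 1 if more than min_skin_pixels skin pixels, else 0."""
    skin_pixels = sum(sum(row) for row in mask)
    return 1 if skin_pixels > min_skin_pixels else 0
```

For an 8x8 map whose central 4x4 block is skin, the skin area is 0.25, the close-up value is 1.0, and people are not reported as present at the default threshold of 20 pixels.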

Composition features 

[0044] Good composition is a very important positive attribute of picture emphasis and bad composition is the most commonly mentioned negative attribute. Automatic evaluation of the composition of an image is very difficult and sometimes subjective. Good composition may follow a number of general well-known rules, such as the rule of thirds, but these rules are often violated to express the photographer's perspective.

Main Subject Detection 

[0045] The algorithm used by the main subject detector 34 is disclosed in commonly assigned patent application Serial No. 09/223,860 entitled "Method for Automatic Determination of Main Subjects in Consumer Images", filed December 31, 1998 in the names of J. Luo, S. Etz and A. Singhal. Referring to Fig. 9, there is shown a block diagram of an overview of the main subject detection method disclosed in Serial No. 09/223,860. First, an input image of a natural scene is acquired and stored in step S200 in a digital form. Then, the image is segmented in step S202 into a few regions of homogeneous properties. Next, the region segments are grouped into larger regions in step S204 based on similarity measures through non-purposive perceptual grouping, and further grouped in step S206 into larger regions [...] purposive grouping (purposive grouping concerns specific objects). The regions are evaluated in step S208 for their saliency using two independent yet complementary types of saliency features: structural saliency features and semantic saliency features. The structural saliency features, including a set of low-level early vision features and a set of geometric features, are extracted in step S208a, which are further processed to generate a set of self-saliency features and a set of relative saliency features. Semantic saliency features in the forms of key subject matters, which are likely to be part of either foreground (for example, people) or background (for example, sky, grass), are detected in step S208b to provide semantic cues as well as scene context cues. The evidences of both types are integrated in step S210 using a reasoning engine based on a Bayes net to yield the final belief map in step S212 of the main subject.

[0046] To the end of semantic interpretation of images, a single criterion is clearly insufficient. The human brain, furnished with its a priori knowledge and enormous memory of real world subjects and scenarios, combines different subjective criteria in order to give an assessment of the interesting or primary subject(s) in a scene. The following extensive list of features are believed to have influences on the human brain in performing such a somewhat intangible task as main subject detection: location, size, brightness, colorfulness, texturefulness, key subject matter, shape, symmetry, spatial relationship (surroundedness/occlusion), borderness, indoor/outdoor, orientation, depth (when applicable), and motion (when applicable for video sequence).

[0047] The low-level early vision features include color, brightness, and texture. The geometric features include location (centrality), spatial relationship (borderness, adjacency, surroundedness, and occlusion), size, shape, and symmetry. The semantic features include skin, face, sky, grass, and other green vegetation. Those skilled in the art can define more features without departing from the scope of the present invention. More details of the main subject detection algorithm are provided in Serial No. 09/223,860, which is incorporated herein by reference.
[0048] The aforementioned version of the main subject detection algorithm is computationally intensive and alternative versions may be used that base subject detection on a smaller set of subject-related features. Since all of the composition measures considered here are with respect to the main subject belief map, it is feasible to concentrate the system on the most computationally effective aspects of these measures, such as aspects bearing mostly on the "centrality" measure. These aspects are considered in judging the main subject, thereby reducing the overall computational complexity at the expense of some accuracy. It is a useful property of the Bayesian network used in the main subject detection algorithm that features can be excluded in this way without requiring the algorithm to be retrained. Secondly, it takes advantage of the fact that images supplied to main subject detector 50 are known to be oriented right-side-up. The subject-related features associated with spatial location of a region within the scene can be modified to reflect this knowledge. For example, without knowing scene orientation the main subject detector 50 assumes a center-weighted distribution of main subject regions, but with known orientation a bottom-center-weighted distribution may be assumed.

[0049] Referring to Figure 3, after the main subject belief map has been computed in the main subject detector 50,
it is segmented in a clustering stage 31 into three regions using k-means clustering of the intensity values. The three
regions correspond to pixels that have high probability of being part of the main subject, pixels that have low probability
of being part of the main subject, and intermediate pixels. Based on the quantized map, the features of main subject
size, centrality, compactness, and interest (variance) are computed as described below in reference to Figs. 5A-5D.
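
The clustering stage can be sketched as follows. This is a minimal illustration only, assuming the belief map has been flattened into a list of intensity values and using a plain one-dimensional k-means. The function name `quantize_belief_map`, the evenly spaced initial centers, and the iteration limit are hypothetical choices and are not taken from the description.

```python
def quantize_belief_map(values, k=3, iterations=20):
    """Cluster 1-D belief values into k levels with plain k-means.

    Returns (labels, centers): label 0 is the lowest-valued cluster and
    label k-1 is the highest-valued cluster.
    """
    lo, hi = min(values), max(values)
    # Assumption: start with centers spread evenly over the observed range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        sums = [0.0] * k
        counts = [0] * k
        for v in values:
            # Assign each value to its nearest center.
            j = min(range(k), key=lambda c: abs(v - centers[c]))
            sums[j] += v
            counts[j] += 1
        # Empty clusters keep their previous center.
        new_centers = [sums[j] / counts[j] if counts[j] else centers[j]
                       for j in range(k)]
        if new_centers == centers:
            break
        centers = new_centers
    centers = sorted(centers)
    labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
    return labels, centers


if __name__ == "__main__":
    labels, centers = quantize_belief_map([0, 5, 10, 120, 130, 125, 250, 255, 245])
    print(labels, centers)
```

With three clusters, label 2 plays the role of the high probability region, label 0 the low probability region, and label 1 the intermediate pixels.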

Main Subject Variance 

[0050] One way to characterize the contents of a photograph is by how interesting it is. For the purpose of emphasis
image selection, an image with the following characteristics might be considered interesting.

• the main subject is interesting in and of itself, by virtue of its placement in the frame.

• the main subject constitutes a reasonably large area of the picture, but not the entire frame.

• the background does not include isolated objects that can distract from the main subject.

An estimate of the interest level of each image is computed by estimating the variance in the main subject map. This
feature is primarily valuable as a counterindicator: that is, uninteresting images should not be the emphasis image. In
particular, and as shown in Figure 5A, the main subject variance detector 30.1 implements the following steps for
computing main subject variance. Initially, in step S10, the statistical variance v of all main subject belief map values
is computed. In step S12, the main subject variance feature y is computed according to the formula:

y = min(1, 2.5*sqrt(v)/127.5)
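
As a rough illustration of steps S10 and S12, the sketch below reads the formula as y = min(1, 2.5*sqrt(v)/127.5). It assumes belief values on a 0 to 255 scale (suggested by the constant 127.5, but not stated outright) and uses the population variance. The function name is hypothetical.

```python
import math


def main_subject_variance_feature(belief_values):
    """Return y = min(1, 2.5 * sqrt(v) / 127.5) for a flat list of belief values."""
    n = len(belief_values)
    mean = sum(belief_values) / n
    # Assumption: population variance (divide by n, not n - 1).
    v = sum((x - mean) ** 2 for x in belief_values) / n
    return min(1.0, 2.5 * math.sqrt(v) / 127.5)


if __name__ == "__main__":
    print(main_subject_variance_feature([100, 110]))   # low-variance map
    print(main_subject_variance_feature([0, 255]))     # saturates at 1.0
```

A completely flat belief map gives y = 0, which matches the counterindicator role described above.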

Main subject centrality 

[0051] The main subject centrality is computed as the distance between the image center and the centroid of the
high probability (and optionally the intermediate probability) region(s) in the quantized main subject belief map. In
particular, and as shown in Figure 5B, the main subject centrality detector 30.2 implements the following steps for
computing main subject centrality. Initially, in step S20, the pixel coordinates of the centroid of the highest-valued cluster
are located. In step S22, the Euclidean distance j from the center of the image to the centroid is computed. In step S24,
the normalized distance k is computed by dividing j by the number of pixels along the shortest side of the image. In
step S26, the main subject centrality feature m is computed according to the formula:

m = min(k, 1)
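
The centrality steps can be sketched as below: find the centroid of the highest-valued cluster, measure its Euclidean distance from the image center, divide by the number of pixels along the shortest side, and clip at 1. The row-major label layout, the `top_label` parameter, the pixel-coordinate convention for the image center, and the value returned for an empty cluster are all assumptions made for illustration.

```python
import math


def main_subject_centrality_feature(labels, width, height, top_label=2):
    """labels: row-major list of cluster labels with width * height entries."""
    sum_x, sum_y, count = 0.0, 0.0, 0
    for idx, lab in enumerate(labels):
        if lab == top_label:
            sum_x += idx % width
            sum_y += idx // width
            count += 1
    if count == 0:
        # Assumption: with no main-subject pixels, report the maximum distance.
        return 1.0
    cx, cy = sum_x / count, sum_y / count
    # Assumption: the image center sits at ((width - 1) / 2, (height - 1) / 2).
    j = math.hypot(cx - (width - 1) / 2, cy - (height - 1) / 2)
    k = j / min(width, height)
    return min(k, 1.0)


if __name__ == "__main__":
    centered = [0, 0, 0,
                0, 2, 0,
                0, 0, 0]
    print(main_subject_centrality_feature(centered, 3, 3))
```

Note that m is a distance, so a value of 0 corresponds to a perfectly centered main subject.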

Main subject size 



[0052] The size of the main subject is determined by the size of the high probability (and optionally the intermediate
probability) region(s) in the quantized main subject belief map. It is expressed as the percentage of the central area
(25% from border) that is occupied by the high (and optionally the intermediate) probability region. In particular, and
as shown in Figure 5C, the main subject size detector 32 implements the following steps for computing main subject
size. Initially, in step S30, the number of pixels f in the intersection of the highest-valued cluster and the rectangular
central 1/4 of the image area is counted. In step S32, the main subject size feature g is computed according to the formula:

g = f/N

where N is the total number of image pixels.
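
A hedged sketch of the size computation follows. It assumes the central area is the rectangle left after trimming 25% of the width and height from each border (one quarter of the image area), that a pixel belongs to this rectangle when its center falls inside it, and that the feature is the count of highest-cluster pixels in the rectangle divided by N. The function and parameter names are hypothetical.

```python
def main_subject_size_feature(labels, width, height, top_label=2):
    """labels: row-major list of cluster labels with width * height entries."""
    x0, x1 = width / 4, 3 * width / 4
    y0, y1 = height / 4, 3 * height / 4
    count = 0
    for idx, lab in enumerate(labels):
        x, y = idx % width, idx // width
        # Assumption: test the pixel center (x + 0.5, y + 0.5) against the rectangle.
        inside = x0 <= x + 0.5 < x1 and y0 <= y + 0.5 < y1
        if lab == top_label and inside:
            count += 1
    return count / (width * height)


if __name__ == "__main__":
    print(main_subject_size_feature([2] * 16, 4, 4))
```

Under these assumptions the feature cannot exceed 0.25, because at most the whole central rectangle can be occupied.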
Main subject compactness

[0053] The compactness of the main subject is estimated by computing a bounding rectangle for the high probability
(and optionally the intermediate probability) region(s) in the quantized main subject belief map and then examining
the percentage of the bounding rectangle that is occupied by the main subject. In particular, and as shown in Figure 5D,
the following steps are implemented for computing main subject compactness. Initially, in step S40, the number of pixels
a in the highest-valued cluster is counted. In step S42, the smallest rectangular box which contains all pixels in the
highest-valued cluster (the bounding box) is computed, and in step S44 the area b of the bounding box, in pixels, is
calculated. The main subject compactness feature e is then determined according to the formula:

e = min(1, max(0, 2*(a/b - 0.2)))

where e will be a value between 0 and 1, inclusive.
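
The compactness steps can be illustrated as follows. The sketch counts the pixels a in the highest-valued cluster, computes the area b of their axis-aligned bounding box, and maps the fill ratio a/b into the range 0 to 1. The offset constant is read here as 0.2, which is an assumption about the formula; it is exposed as the hypothetical parameter `offset` so that it can be changed.

```python
def main_subject_compactness_feature(labels, width, top_label=2, offset=0.2):
    """labels: row-major list of cluster labels; width: pixels per row."""
    xs, ys = [], []
    for idx, lab in enumerate(labels):
        if lab == top_label:
            xs.append(idx % width)
            ys.append(idx // width)
    a = len(xs)
    if a == 0:
        # Assumption: an empty cluster is treated as not compact at all.
        return 0.0
    b = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)  # bounding-box area
    return min(1.0, max(0.0, 2 * (a / b - offset)))


if __name__ == "__main__":
    diagonal = [2 if i % 6 == 0 else 0 for i in range(25)]  # 5 x 5 image
    print(main_subject_compactness_feature(diagonal, 5))
    print(main_subject_compactness_feature([2, 2, 2, 2], 2))
```

A thin diagonal stroke fills little of its bounding box and scores near 0, while a solid rectangle saturates at 1.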



Classification Stage 



[0054] The feature quantities generated according to the algorithms set forth above are applied to the classification
stage 14, which is preferably a reasoning engine that accepts as input the self-salient and/or the relative-salient features
and is trained to generate image assessment (emphasis and appeal) values. Different evidences may compete or
reinforce each other according to knowledge derived from the results of the ground truth study of human observers'
evaluations of real images. Competition and reinforcement are resolved by the inference network of the reasoning engine.
A preferred reasoning engine is a Bayes network.

[0055] A Bayes net (see, e.g., J. Pearl, Probabilistic Reasoning in Intelligent Systems, San Francisco, CA: Morgan
Kaufmann, 1988) is a directed acyclic graph that represents causality relationships between various entities in the
graph, and where the direction of links represents causality. Evaluation is based on knowledge of the joint Probability
Distribution Function (PDF) among various entities. The Bayes net advantages include explicit uncertainty
characterization, efficient computation, easy construction and maintenance, quick training, and fast adaptation to changes
in the network structure and its parameters. A Bayes net consists of four components:

• Priors: The initial beliefs about various nodes in the Bayes net.

• Conditional Probability Matrices (CPMs): Knowledge about the relationship between two connected nodes.

• Evidences: Observations from feature detectors that are input to the Bayes net.

• Posteriors: The final computed beliefs after the evidences have been propagated through the Bayes net.

[0056] The most important component for training is the set of CPMs, shown as CPM stages 15.1 ... 15.9 in Figure 1
(and 15.1 ... 15.7 in Figure 2), because they represent domain knowledge for the particular application at hand. While
the derivation of CPMs is familiar to a person skilled in using reasoning engines such as a Bayes net, the derivation
of an exemplary CPM will be considered later in this description.

[0057] Referring to Figures 1 and 2, a simple two-level Bayes net is used in the current system, where the emphasis
(or appeal) score is determined at the root node and all the feature detectors are at the leaf nodes. It should be noted
that each link is assumed to be conditionally independent of other links at the same level, which results in convenient
training of the entire net by training each link separately, i.e., deriving the CPM for a given link independent of others.
This assumption is often violated in practice; however, the independence simplification makes implementation feasible
and produces reasonable results. It also provides a baseline for comparison with other classifiers or reasoning engines.

Probabilistic Reasoning

[0058] All the features are integrated by a Bayes net to yield the emphasis or appeal value. On one hand, different
evidences may compete with or contradict each other. On the other hand, different evidences may mutually reinforce
each other according to prior models or knowledge of typical photographic scenes. Both competition and reinforcement
are resolved by the Bayes net-based inference engine.

[0059] Referring to Fig. 10, a two-level Bayesian net is used in the present invention that assumes conditional
independence between various feature detectors. The emphasis or appeal value is determined at the root node 44 and all
the feature detectors are at the leaf nodes 46. There is one Bayes net active for each image. It is to be understood
that the present invention can be used with a Bayes net that has more than two levels without departing from the scope
of the present invention.
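
To make the two-level structure concrete, the following sketch computes the belief at the root node from a prior, one CPM per leaf, and one detector output per leaf, under the conditional independence assumption stated above. Treating each detector output as soft evidence that mixes the two CPM columns is an assumption of this sketch, as are the function name and the `cpm[a][s]` layout, read as P(feature = s | emphasis or appeal = a). The numbers in the usage example are the values listed in Table 2.

```python
def root_belief(prior, cpms, evidences):
    """Posterior belief that the root (emphasis or appeal) node equals 1.

    prior      -- P(root = 1)
    cpms       -- one matrix per leaf, cpm[a][s] = P(feature = s | root = a)
    evidences  -- one detector output per leaf, each in [0, 1]
    """
    belief_1 = prior
    belief_0 = 1.0 - prior
    for cpm, f in zip(cpms, evidences):
        # Assumption: soft evidence f weights the "feature = 1" column and
        # (1 - f) weights the "feature = 0" column.
        belief_1 *= f * cpm[1][1] + (1.0 - f) * cpm[1][0]
        belief_0 *= f * cpm[0][1] + (1.0 - f) * cpm[0][0]
    return belief_1 / (belief_1 + belief_0)


if __name__ == "__main__":
    # Rows: root = 0, root = 1; columns: feature = 0, feature = 1.
    cpm = [[0.83, 0.17],
           [0.65, 0.35]]
    print(root_belief(0.5, [cpm], [1.0]))        # one supportive feature
    print(root_belief(0.5, [cpm, cpm], [1.0, 1.0]))  # two features reinforce
```

Because each leaf contributes an independent multiplicative factor, a feature can be dropped simply by omitting its factor.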

Training Bayes nets 

[0060] One advantage of Bayes nets is that each link is assumed to be independent of links at the same level. Therefore,
it is convenient to train the entire net by training each link separately, i.e., deriving the CPM 15.1 ... 15.9 for a given
link independent of others. In general, two methods are used for obtaining the CPM for each root-feature node pair:

1. Using Expert Knowledge

[0061] This is an ad-hoc method. An expert is consulted to obtain the conditional probabilities of each feature detector
producing a high or low output given a highly appealing image.

2. Using Contingency Tables 

[0062] This is a sampling and correlation method. Multiple observations of each feature detector are recorded along
with information about the emphasis or appeal. These observations are then compiled together to create contingency
tables which, when normalized, can then be used as the CPM 15.1 ... 15.9. This method is similar to a neural network
type of training (learning). This method is preferred in the present invention.

[0063] Consider the CPM for an arbitrary feature as an example. This matrix was generated using contingency tables
derived from the ground truth and the feature detector. Since the feature detector in general does not supply a binary
decision (referring to Table 1), fractional frequency count is used in deriving the CPM. The entries in the CPM are
determined by



CPM = \left[ \left( \sum_{i \in I} \sum_{r \in R_i} n_i F_r^{T} T_r \right) P \right]^{T}    (14)

P = \mathrm{diag}\{p_j\}, \qquad p_j = \left( \sum_{i \in I} \sum_{r \in R_i} n_i (T_r)_j \right)^{-1}


where I is the set of all training image groups, R_i is the set of all images in group i, and n_i is the number of
observations (observers) for group i. Moreover, F_r represents an M-label feature vector for image r, T_r represents an
L-level ground-truth vector, and P denotes an L x L diagonal matrix of normalization constant factors. For example, in
Table 1, images 1, 4, 5 and 7 contribute to boxes 00, 11, 10 and 01 in Table 2, respectively. Note that all the belief
values have been normalized by the proper belief sensors. As an intuitive interpretation of the first column of the CPM
for centrality, an image with a high feature value is about twice as likely to be highly appealing than not.

Table 1: An example of training the CPM.

Image Number   Ground Truth   Feature Detector Output   Contribution
1              0              0.017                     00
2              0              0.211                     00
3              0              0.011                     00
4              0.933          0.953                     11
5              0              0.673                     10
6              1              0.891                     11
7              0.93           0.072                     01
8              1              0.091                     01

Table 2: The trained CPM.

                          Feature = 1   Feature = 0
Emphasis or Appeal = 1    0.35 (11)     0.65 (01)
Emphasis or Appeal = 0    0.17 (10)     0.83 (00)
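
The fractional frequency count described above can be sketched for the binary case as follows. Each observation spreads its weight over the four boxes (00, 01, 10, 11) in proportion to the detector output and the ground-truth belief, and each ground-truth row is then normalized to sum to 1. Representing the feature vector as (1 - f, f) and the ground-truth vector as (1 - t, t) is an assumption of this sketch, as are the function name and the input layout. The eight rows of Table 1 are only an excerpt, so this code is not expected to reproduce the values in Table 2.

```python
def train_cpm(samples):
    """Estimate a 2 x 2 CPM by fractional frequency counting.

    samples -- iterable of (n_observers, feature_output, ground_truth), with
               feature_output and ground_truth in [0, 1].
    Returns cpm[a][s], read as P(feature = s | emphasis or appeal = a).
    """
    counts = [[0.0, 0.0], [0.0, 0.0]]
    for n, f, t in samples:
        feature_vec = (1.0 - f, f)   # assumed binary feature vector
        truth_vec = (1.0 - t, t)     # assumed binary ground-truth vector
        for a in range(2):
            for s in range(2):
                counts[a][s] += n * truth_vec[a] * feature_vec[s]
    cpm = []
    for row in counts:
        total = sum(row)
        # Assumption: fall back to a uniform row when a level is never observed.
        cpm.append([c / total if total else 0.5 for c in row])
    return cpm


if __name__ == "__main__":
    table_1 = [(1, 0.017, 0), (1, 0.211, 0), (1, 0.011, 0), (1, 0.953, 0.933),
               (1, 0.673, 0), (1, 0.891, 1), (1, 0.072, 0.93), (1, 0.091, 1)]
    for row in train_cpm(table_1):
        print([round(p, 3) for p in row])
```

Weighting each sample by its number of observers lets image groups that were judged by more people count for more in the trained matrix.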



[0064] While the invention has been described for use with a Bayes net, different reasoning engines may be employed
in place of the Bayes net. For example, in Pattern Recognition and Neural Networks by B.D. Ripley (Cambridge
University Press, 1996), a variety of different classifiers are described that can be used to solve pattern recognition
problems, where having the right feature is normally the most important consideration. Such classifiers include linear
discriminant analysis methods, flexible discriminants, (feed-forward) neural networks, non-parametric methods,
tree-structured classifiers, and belief networks (such as Bayesian networks). It will be obvious to anyone of ordinary skill

Computer System



[0065] In describing the present invention, it should be apparent that the present invention is preferably utilized on

in detail herein. It is also instructive to note that the images are either directly input into the computer system (for
example by a digital camera) or digitized before input into the computer system (for example by scanning an original,
such as a silver halide film).

[0066] Referring to Fig. 11, there is illustrated a computer system 110 for implementing the present invention.
Although the computer system 110 is shown for the purpose of illustrating a preferred embodiment, the present invention
is not limited to the computer system 110 shown, but may be used on any electronic processing system. The computer
system 110 includes a microprocessor-based unit 112 for receiving and processing software programs and for
performing other processing functions. A display 114 is electrically connected to the microprocessor-based unit 112 for
displaying user-related information associated with the software, e.g., by means of a graphical user interface. A
keyboard 116 is also connected to the microprocessor-based unit 112 for permitting a user to input information to the
software. As an alternative to using the keyboard 116 for input, a mouse 118 may be used for moving a selector 120
on the display 114 and for selecting an item on which the selector 120 overlays, as is well known in the art.

[0067] A compact disk-read only memory (CD-ROM) 122 is connected to the microprocessor-based unit 112 for
receiving software programs and for providing a means of inputting the software programs and other information to the
microprocessor-based unit 112 via a compact disk 124, which typically includes a software program. In accordance
with the invention, this software program could include the image assessment program described herein, as well as
programs that utilize its output, such as the automatic albuming program. In addition, a floppy disk 126 may also include
the software program, and is inserted into the microprocessor-based unit 112 for inputting the software program. Still
further, the microprocessor-based unit 112 may be programmed, as is well known in the art, for storing the software
program internally. The microprocessor-based unit 112 may also have a network connection 127, such as a telephone
line, to an external network, such as a local area network or the Internet. The program could thus be stored on a remote
server and accessed therefrom, or downloaded as needed. A printer 128 is connected to the microprocessor-based
unit 112 for printing a hardcopy of the output of the computer system 110.

[0068] Images may also be displayed on the display 114 via a personal computer card (PC card) 130, such as, as
it was formerly known, a PCMCIA card (based on the specifications of the Personal Computer Memory Card
International Association), which contains digitized images electronically embodied in the card 130. The PC card 130 is
ultimately inserted into the microprocessor-based unit 112 for permitting visual display of the image on the display 114.
Images may also be input via the compact disk 124, the floppy disk 126, or the network connection 127. Any images
stored in the PC card 130, the floppy disk 126 or the compact disk 124, or input through the network connection 127,
may have been obtained from a variety of sources, such as a digital camera (not shown) or a scanner (not shown).
[0069] The subject matter of the present invention relates to digital image understanding technology, which is
understood to mean technology that digitally processes a digital image to recognize and thereby assign useful meaning
to human understandable objects, attributes or conditions and then to utilize the results obtained in the further
processing of the digital image.

[0070] The invention has been described with reference to a preferred embodiment. However, it will be appreciated
that variations and modifications can be effected by a person of ordinary skill in the art without departing from the scope
of the invention. While the invention has been described from time to time in connection with automatic albuming, it
should be clear that there are many other uses for the invention, in particular any kind of image processing applications
where an image needs to be evaluated for some process based on its relative or intrinsic value.

Claims

1. A method for assessing an image with respect to certain features, wherein said assessment is a determination of
the degree of importance, interest or attractiveness of an image, said method comprising the steps of:

(a) obtaining a digital image corresponding to the image;

(b) computing one or more quantities related to one or more features in the digital image, including one or
more features pertaining to the content of the digital image;

(c) processing said one or more quantities with a reasoning algorithm that is trained on the opinions of one or
more human observers; and

(d) providing an output from the reasoning algorithm that assesses the image.

2. The method as claimed in claim 1 wherein the features pertaining to the content of the digital image include at
least one of people-related features and subject-related features.

3. The method as claimed in claim 1 wherein step (b) further includes computing one or more quantities related to
one or more objective features pertaining to objective measures of the digital image.

4. The method as claimed in claim 3 wherein the objective features include at least one of colorfulness and sharpness.
5. The method as claimed in claim 1 wherein the reasoning algorithm in step (c) is trained at least in part from ground
truth studies of candidate images.

6. The method as claimed in claim 1 wherein the reasoning algorithm is a Bayesian network.

7. The method as claimed in claim 1 wherein step (d) provides an output from the reasoning algorithm that scores
the assessment of the image.

8. The method as claimed in claim 1 wherein step (a) obtains a plurality of digital images corresponding to a plurality
of images and steps (b)-(d) are performed on each of the digital images such that an assessment is performed for
each of the images.

9. The method as claimed in claim 8 wherein the assessments are ranked such that one image with the highest rank
is selected as an emphasis image.

10. The method as claimed in claim 1 wherein the assessment of the image is used in an automatic albuming algorithm
to arrange a page layout of images.